Temporal action localization is an important yet challenging problem. Given a long, untrimmed video consisting of multiple action instances and complex background content, we need not only to recognize their action categories, but also to localize the start time and end time of each instance. Many state-of-the-art systems use segment-level classifiers to select and rank proposal segments with pre-determined boundaries. However, a desirable model should move beyond segment-level predictions and make dense predictions at a fine granularity in time to determine precise temporal boundaries. To this end, we design a novel Convolutional-De-Convolutional (CDC) network that places CDC filters on top of 3D ConvNets, which have been shown to be effective for abstracting action semantics but reduce the temporal length of the input data. The proposed CDC filter performs the required temporal upsampling and spatial downsampling operations simultaneously to predict actions at frame-level granularity. It is unique in jointly modeling action semantics in space-time and fine-grained temporal dynamics. We train the CDC network efficiently in an end-to-end manner. Our model not only achieves superior performance in detecting actions in every frame, but also significantly boosts the precision of localizing temporal boundaries. Finally, the CDC network demonstrates very high efficiency, processing 500 frames per second on a single GPU server. We will update the camera-ready version and publish the source code online soon.
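To make the key operation concrete, here is a minimal NumPy sketch of a filter that upsamples in time and downsamples in space in one joint step, as the abstract describes. This is an illustrative reading, not the paper's implementation: the function name `cdc_filter`, the tensor shapes, and the factor-2 temporal upsampling with a full H×W spatial collapse are assumptions for the sketch.

```python
import numpy as np

def cdc_filter(x, w, b):
    """Hypothetical minimal CDC layer.

    x: input feature map, shape (C_in, L, H, W)
    w: weights, shape (2, C_out, C_in, H, W) -- two independent spatial
       filters per output channel, one for each of the two output frames
       produced from every input time step (temporal upsampling x2 and
       spatial downsampling H x W -> 1 x 1, applied jointly)
    b: bias, shape (C_out,)
    returns: frame-level features, shape (C_out, 2 * L)
    """
    C_in, L, H, W = x.shape
    C_out = w.shape[1]
    out = np.empty((C_out, 2 * L))
    for t in range(L):
        for k in range(2):  # two output frames per input time step
            # collapse the spatial extent while placing the result at
            # upsampled temporal position 2*t + k
            out[:, 2 * t + k] = np.tensordot(
                w[k], x[:, t], axes=([1, 2, 3], [0, 1, 2])) + b
    return out
```

Because each of the two temporal outputs has its own spatial filter, this joint parameterization is more expressive than first pooling space to 1×1 and then applying a shared temporal deconvolution.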